03 Fine-tuning

FTはプロンプトエンジニアリングとは別のテクニック

コンテキストブロックをハック（？）

定義

continuing the training process on a smaller, domain-specific dataset to optimize a model for a specific task

ベースモデル -> 全く別のモデル

一般的なモデル（gpt-3.5, 4）のチューニング

general knowledge

benefit

Improve model performance on specific task

モデルのコンテキストウィンドウに対して多くのデータを与えられる

例を提示できる

Improve model efficiency

リクエストごとのトークン数が少なくなる

レイテンシが改善

distill (GPT-4から3.5)

（FSLはfew-shot learning）

fine-tuningの例：構造化

FTしない場合

出力（JSON）のスキーマを定義

プロンプトでExampleを渡す

code:complex instruction

System

Your role is to assist in the extraction of structured information from text. Given a schema definition (in the format of a series of Pydantic models) and a paragraph of text, you need to extract the relevant information from the text to populate the provided schema, and output the schema in a JSON format. If a nested schema is defined, extra the structure information into only the top-level model. Ensure all requires fields are filled out to the best of your ability. Do not output anything except the valid JSON object.

Here is the schema definition for a real estate listing

class Feature(BaseModel):

bedrooms: int

bathrooms: int

square footage: int

lot_size: float

class RealEstateListing(BaseModel):

agent_id: int

property_type: str

address: str

listing_date: date

price: float

features: Feature

status: str

historical_prices: List[Tupledate, float]

ExampleとしてUserとAssistantが続く

https://youtu.be/ahnGLM-RC1Y?si=w-PDqn6dc4xXGbF1&t=1606 参照

生成されたJSONにはmistakeがある

fine-tuningではdatasetを用意

User（テキスト）、Assistant（JSON形式出力）のペア

スキーマは正解として渡している

fine-tuningしたモデルは、formatの指定が不要になる

適する

emphasize knowledge that already exists in the model

モデルの知識のサブセットを強調

customizing the structure or tone of responses

teaching a model very complex instructions

コンテキストウィンドウに複雑な指示を詰め込むより有効

適さない

新しい知識の追加

RAGを考える

新しいユースケースに急ぎ反復する

フィードバックは遅くなる（データセットづくりなど）

if prompt engineering isn't helping, fine-tuning likely isn't right for your use-case

サクセスストーリー（Canva）

自然言語でデザイン記述

モデルは設計ガイドラインを生成

人間が評価（0-2）

fine-tuned GPT-3.5がGPT-4を上回った

なぜ機能したか

新しい知識は不要だった

特殊な出力構造の要求

訓練データがあった

GPT-3.5, 4での評価でどこで失敗しどこで成功したか理解していた

別の例（tale）

文章生成（執筆アシスタント）、トーンを著者の書き方に調整したい

2年分 Slack 140k messages

Slackの書き方を再現してしまった

「Write a 500 word blog」Slackだから「朝やる」（33:30）

「いまお願いします」「ok」

fine tuneに使ったデータが最終的にほしいトーンに関係するか

Eメールやブログポストに変えた

Steps（手順）

データセット

training

OpenAIのAPI

OSSのモデルはGPUで

ハイパーパラメタ選べる（理解が必要。影響まで。過剰適合？壊滅的忘却？）

loss関数への理解

次のトークン予測では、下流タスクに影響しない（と言っている？）

evaluation

テストセット

複数モデルの出力にランキングを付けるという方法もある

デプロイしてinference

新しいデータセットを取得できる

ベストプラクティス

プロンプトエンジニアリングとFSL（few-shot）からはじめよ

小さい投資

ベースラインを確立せよ

比較するため（Canvaの例）

Start small, focus on quality

データセットの構築は難しい

少量の高品質データ

問題領域を見つけ、新しいデータで手を打つ

fine tune & RAG

いいとこどり

1. 複雑な指示も理解できる（fine-tune）

2. プロンプトエンジニアリングのテキスト減らせる（＝retrievalに使える）

3. RAGで関連する情報を追加（2で生まれたコンテキストウィンドウの隙間に）